Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface
Abstract
Recent improvements are presented for the phonetic decoding of continuous speech from ultrasound and optical observations of the tongue and lips in a silent speech interface application. In a new approach to this critical step, the visual streams are modeled by context-dependent multi-stream Hidden Markov Models (CD-MSHMMs). Results are compared to a baseline system using context-independent modeling and a visual feature fusion strategy, with both systems evaluated on a one-hour, phonetically balanced English speech database. Tongue and lip images are coded using PCA-based feature extraction techniques. The uttered speech signal, also recorded, is used to initialize the training of the visual HMMs. Visual phonetic decoding performance is evaluated with and without the linguistic constraints introduced via a 2.5k-word decoding dictionary.
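In a multi-stream HMM, each state combines the per-stream observation likelihoods (here, the ultrasound and lip streams) through exponent weights, so in the log domain the state score is a weighted sum. The sketch below illustrates only this combination rule; the stream scores and weights are toy values for illustration, not the paper's trained parameters.

```python
import numpy as np

def multistream_log_likelihood(stream_log_likes, stream_weights):
    """Combine per-stream state log-likelihoods log b_js(o_s) with stream
    weights w_s:  log b_j(o) = sum_s w_s * log b_js(o_s)."""
    return float(np.dot(stream_weights, stream_log_likes))

# Toy scores for one HMM state: [ultrasound stream, lip stream].
stream_log_likes = np.array([-12.3, -8.1])
stream_weights = np.array([0.6, 0.4])  # illustrative weights, not from the paper
score = multistream_log_likelihood(stream_log_likes, stream_weights)
```

With these toy numbers the combined state score is 0.6 * (-12.3) + 0.4 * (-8.1) = -10.62; in the feature-fusion baseline the streams would instead be concatenated into a single observation vector before modeling.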
Similar articles
Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips
This article presents a framework for a phonetic vocoder driven by ultrasound and optical images of the tongue and lips for a “silent speech interface” application. The system is built around an HMM-based visual phone recognition step which provides target phonetic sequences from a continuous visual observation stream. The phonetic target constrains the search for the optimal sequence of diphon...
Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips
This article presents a segmental vocoder driven by ultrasound and optical images (standard CCD camera) of the tongue and lips for a “silent speech interface” application, usable either by a laryngectomized patient or for silent communication. The system is built around an audio–visual dictionary which associates visual to acoustic observations for each phonetic class. Visual features are extra...
Phone recognition from ultrasound and optical video sequences for a silent speech interface
Latest results on continuous speech phone recognition from video observations of the tongue and lips are described in the context of an ultrasound-based silent speech interface. The study is based on a new 61-minute audiovisual database containing ultrasound sequences of the tongue as well as both frontal and lateral views of the speaker's lips. Phonetically balanced and exhibiting good diphone ...
Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression
This paper presents a method for automatically animating the articulatory tongue model of a reference speaker from ultrasound images of the tongue of another speaker. This work is developed in the context of speech therapy based on visual biofeedback, where a speaker is provided with visual information about his/her own articulation. In our approach, the feedback is delivered via an articulator...
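Gaussian mixture regression of the kind named in this title predicts a target variable from an input by conditioning a joint Gaussian mixture model: the output is a responsibility-weighted mix of per-component conditional means. The following minimal sketch uses a hand-specified two-component joint GMM over scalar (x, y); all parameters are illustrative toy values, not those estimated in the paper.

```python
import numpy as np

# Toy joint GMM over (x, y), K = 2 components, both variables scalar.
weights = np.array([0.5, 0.5])   # component priors
mu_x = np.array([0.0, 4.0])      # input means
mu_y = np.array([1.0, -1.0])     # output means
var_x = np.array([1.0, 1.0])     # input variances
cov_xy = np.array([0.8, -0.8])   # input/output covariances

def gmr_predict(x):
    """E[y | x] under the joint GMM (Gaussian mixture regression)."""
    # Responsibilities h_k(x) from the marginal p(x) of each component.
    px = weights * np.exp(-0.5 * (x - mu_x) ** 2 / var_x) / np.sqrt(2 * np.pi * var_x)
    h = px / px.sum()
    # Per-component conditional means E[y | x, k], then mix.
    cond = mu_y + cov_xy / var_x * (x - mu_x)
    return float(h @ cond)
```

Near x = 0 the first component dominates, so the prediction is close to its conditional mean (about 1.0); in the paper's setting, x would be an ultrasound feature vector and y the tongue-model parameters, with vector/matrix versions of the same formulas.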
Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface
The article presents an HMM-based mapping approach for converting ultrasound and video images of the vocal tract into an audible speech signal, for a silent speech interface application. The proposed technique is based on the joint modeling of articulatory and spectral features, for each phonetic class, using Hidden Markov Models (HMM) and multivariate Gaussian distributions with full covarianc...
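With a full-covariance Gaussian jointly modeling stacked articulatory and spectral features for a phonetic class, the spectral features can be predicted from the articulatory ones via the standard conditional-mean formula E[y | x] = mu_y + S_yx S_xx^(-1) (x - mu_x). The sketch below uses toy numbers for one class (a 2-D articulatory part and a 1-D spectral part), not the paper's trained model.

```python
import numpy as np

# Illustrative joint Gaussian over stacked [articulatory; spectral] features
# for one phonetic class (toy parameters, not from the paper).
mu = np.array([0.0, 0.0, 1.0])          # [x1, x2, y]
cov = np.array([[1.0, 0.2, 0.5],
                [0.2, 1.0, 0.3],
                [0.5, 0.3, 1.0]])

dx = 2                                   # articulatory dimensions; the rest is spectral
mu_x, mu_y = mu[:dx], mu[dx:]
Sxx, Syx = cov[:dx, :dx], cov[dx:, :dx]

def spectral_from_articulatory(x):
    """Conditional mean E[y | x] of the joint full-covariance Gaussian."""
    return mu_y + Syx @ np.linalg.solve(Sxx, x - mu_x)
```

At x = mu_x the prediction is simply mu_y; the cross-covariance block S_yx is what lets articulatory deviations shift the predicted spectrum, which a block-diagonal (stream-independent) covariance could not capture.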